Efficiently extracting full parse trees using regular expressions with capture groups

Authors

  • Niko Schwarz
  • Aaron Karper
  • Oscar Nierstrasz
Abstract

Regular expressions with capture groups offer a concise and natural way to define parse trees over the text that they parse. However, classical algorithms return only a single match for each capture group, not the full parse tree. We describe an algorithm based on finite-state automata that extracts full parse trees from text in Θ(nm) time and Θ(dn+m) space, where n is the size of the text, m the size of the pattern, and d the number of groups in the pattern. It is the first to do so in a single pass with complete control over greediness. This allows the algorithm to process streaming data using all constructs familiar to users of regular expressions.
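The limitation the abstract mentions can be seen with Python's standard `re` module, where a capture group inside a repetition keeps only its last match. The sketch below is an illustration only: the pattern and the helper names `last_capture` and `all_captures` are assumptions made for this example, and the second helper is a plain rescanning workaround, not the single-pass automaton algorithm described in the paper.

```python
import re

# Hypothetical example pattern: a comma-separated list whose items are captured
# by a group that sits inside a repetition.
LIST_PATTERN = re.compile(r"(?:(\w+),?)*")
ITEM_PATTERN = re.compile(r"\w+")


def last_capture(text):
    """Return what a classical matcher reports for the repeated group.

    Python's `re` keeps only the final iteration of a repeated group.
    """
    match = LIST_PATTERN.fullmatch(text)
    return match.group(1) if match else None


def all_captures(text):
    """Recover every item by scanning the text a second time.

    This is a workaround for illustration, not the paper's algorithm:
    it needs an extra pass and a separately written item pattern.
    """
    if LIST_PATTERN.fullmatch(text) is None:
        return []
    return [m.group(0) for m in ITEM_PATTERN.finditer(text)]


if __name__ == "__main__":
    print(last_capture("a,b,c"))
    print(all_captures("a,b,c"))
```

For the input `a,b,c`, `last_capture` yields only `c`, while `all_captures` yields the three items. Recovering the items here takes a second scan and a hand-written item pattern, which is the kind of extra work a full parse tree extraction would make unnecessary.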

Similar resources

Extending Regular Expressions with Context Operators and Parse Extraction

Regular expressions are used in many applications to specify patterns because any regular expression can be compiled into a very efficient one-pass pattern matcher called a finite automaton. Finding matches is useful, but even more useful is parse extraction, which describes in detail how a pattern matches some input. After matching an address, for example, parse extraction makes it easy to fin...

A Computational Interpretation of Context-Free Expressions

We phrase parsing with context-free expressions as a type inhabitation problem where values are parse trees and types are context-free expressions. We first show how containment among context-free and regular expressions can be reduced to a reachability problem by using a canonical representation of states. The proofs-as-programs principle yields a computational interpretation of the reachabilit...

Parsing Strings and Trees with Parse::Eyapp (An Introduction to Compiler Construction)

Parse::Eyapp (Extended yapp) is a collection of modules that extends Francois Desarmenien's Parse::Yapp 1.05. Eyapp extends yacc/yapp syntax with functionalities like named attributes, EBNF-like expressions, modifiable default action, automatic syntax tree building, semi-automatic abstract syntax tree building, translation schemes, tree regular expressions, tree transformations, scope analysis su...

Yacc is dead

We present two novel approaches to parsing context-free languages. The first approach is based on an extension of Brzozowski’s derivative from regular expressions to context-free grammars. The second approach is based on a generalization of the derivative to parser combinators. The payoff of these techniques is a small (less than 250 lines of code), easy-to-implement parsing library capable of ...

Unsupervised Tree Induction for Tree-based Translation

In current research, most tree-based translation models are built directly from parse trees. In this study, we go in another direction and build a translation model with an unsupervised tree structure derived from a novel non-parametric Bayesian model. In the model, we utilize synchronous tree substitution grammars (STSG) to capture the bilingual mapping between language pairs. To train the mod...

Journal:
  • PeerJ PrePrints

Volume 3

Publication date: 2015